Results 1 - 9 of 9
1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38609331

ABSTRACT

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. The PEDD has been used in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review state-of-the-art relation extraction research and provide an overview of how the PEDD was compiled. Furthermore, we present the results of the PPI extraction competition and evaluate the performance of several language models on the PEDD. The outcomes of this paper will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.


Subjects
Drug Discovery , Natural Language Processing , Signal Transduction
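The PEDD abstract reports sentences and gene pairs as separate counts. A common way to build relation-extraction candidates is to pair every two distinct gene mentions that co-occur in a sentence, so a sentence with n genes yields n(n-1)/2 candidates. The Python sketch below assumes that scheme; the abstract does not state how the PEDD pairs were generated, and the helper `candidate_pairs` and the gene symbols are hypothetical.

```python
from itertools import combinations


def candidate_pairs(gene_mentions):
    """Return every unordered pair of distinct gene mentions in one sentence.

    Assumption: each pair becomes one relation-extraction instance that a
    model later labels with a relation type (or "no relation").
    """
    # Deduplicate while keeping first-seen order so the pairs are deterministic.
    unique = list(dict.fromkeys(gene_mentions))
    return list(combinations(unique, 2))


sentence_genes = ["TP53", "MDM2", "CDKN1A"]  # hypothetical mentions
pairs = candidate_pairs(sentence_genes)
print(len(pairs))  # 3
print(pairs[0])    # ('TP53', 'MDM2')
```

Pairs are unordered here; a scheme that distinguishes directed relations (A acts on B versus B acts on A) would use `itertools.permutations` instead.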
2.
Brief Bioinform ; 21(6): 2219-2238, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-32602538

ABSTRACT

Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems exist for numerous applications, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein-protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, because applications keep multiplying and techniques keep evolving, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonly used BNER datasets and their potential annotation problems, such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that addresses potential problems in the original, and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein-protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. The EBED is a multi-task dataset with annotations covering gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track of the AI CUP Biomedical Paper Analysis challenge. Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP_EBED_dataset.rar. Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681; Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 2211, Fax: 886-2-2782-4814. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Data Mining , Information Storage and Retrieval , Natural Language Processing , Benchmarking , Computational Biology/methods , Data Mining/methods , Databases, Factual , Neural Networks, Computer , PubMed , Software , Surveys and Questionnaires
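BNER corpora such as JNLPBA are commonly distributed as token-level BIO tags (B = beginning of an entity, I = inside, O = outside). The sketch below converts character-offset entity annotations into BIO tags, assuming whitespace tokenization and entity spans that align with token boundaries. The labels `Chemical` and `Gene`, the example sentence and the function `spans_to_bio` are hypothetical; the abstract does not describe the EBED file format.

```python
def spans_to_bio(text, entities):
    """Convert (start, end, label) character spans to per-token BIO tags.

    Assumes whitespace tokenization and spans aligned with token boundaries:
    a token is tagged only if it lies entirely inside an entity span.
    """
    tagged = []
    pos = 0
    for token in text.split():
        # Locate the token's character offsets, scanning left to right.
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for ent_start, ent_end, label in entities:
            if start >= ent_start and end <= ent_end:
                # First token of the span gets B-, later tokens get I-.
                tag = ("B-" if start == ent_start else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


sentence = "Imatinib inhibits BCR ABL kinase"
annotations = [(0, 8, "Chemical"), (18, 25, "Gene")]  # hypothetical spans
for token, tag in spans_to_bio(sentence, annotations):
    print(token, tag)
```

Real corpora usually need a proper tokenizer and a policy for spans that split a token; this sketch simply leaves such tokens tagged `O`.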
3.
Database (Oxford) ; 2019, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30809637

ABSTRACT

Detecting microRNA (miRNA) mentions in the scientific literature enables researchers to find relevant literature with queries formulated using miRNA information. Because most published biological studies describe signal transduction pathways or genetic regulatory information in figure captions, extracting miRNAs from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analysis of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components of the text-mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous works relying on patterns, dictionaries and supervised learning, and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, which explains why the system considers a span of text to be a miRNA mention and allows domain experts to further complement the represented knowledge. We studied the ambiguity of miRNA nomenclature to connect miRNA mentions to the Rfam database, and evaluated our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, the latter extended with additional Rfam normalization information. Our study highlights, and offers a better understanding of, the challenges associated with miRNA identification and normalization in the scientific literature and the research gap that needs to be explored in future studies.


Subjects
MicroRNAs/metabolism , Publications , Statistics as Topic , Algorithms , Databases, Genetic , Internet , MicroRNAs/genetics , Molecular Sequence Annotation
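The abstract notes that earlier miRNA recognizers relied on patterns and dictionaries. As an illustration of that pattern-based baseline only (not the paper's statistical principle-based method), the sketch below finds mentions such as `hsa-miR-155-5p` or `microRNA-21` and reduces them to a canonical key that a dictionary lookup could then map to a database identifier. The regular expression and the function `find_mirnas` are hypothetical; the pattern ignores nomenclature distinctions such as precursor `mir` versus mature `miR`, does not cover families like let-7, and can mistake other three-letter prefixes for species codes.

```python
import re

# Hypothetical pattern: optional three-letter species prefix such as "hsa-",
# then "miR" or "microRNA", the family number, an optional letter suffix
# and an optional arm ("-5p"/"-3p").
MIRNA_PATTERN = re.compile(
    r"\b(?:(?P<species>[a-z]{3})-)?(?:miR|microRNA)-?(?P<num>\d+)"
    r"(?P<letter>[a-z]?)(?:-(?P<arm>[35]p))?\b",
    re.IGNORECASE,
)


def find_mirnas(text):
    """Return (mention, normalized key) pairs, e.g. 'microRNA-21' -> 'miR-21'.

    The species prefix is dropped from the key; a real normalizer would keep
    it and resolve the key against a dictionary of database identifiers.
    """
    found = []
    for match in MIRNA_PATTERN.finditer(text):
        key = "miR-" + match.group("num") + match.group("letter").lower()
        if match.group("arm"):
            key += "-" + match.group("arm").lower()
        found.append((match.group(0), key))
    return found


print(find_mirnas("Both hsa-miR-155-5p and microRNA-21 were upregulated."))
# [('hsa-miR-155-5p', 'miR-155-5p'), ('microRNA-21', 'miR-21')]
```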
4.
J Cheminform ; 10(1): 64, 2018 Dec 17.
Article in English | MEDLINE | ID: mdl-30560325

ABSTRACT

The large number of chemical and pharmaceutical patents has attracted biomedical text-mining researchers seeking to extract valuable information such as chemicals, genes and gene products. To facilitate gene and gene product annotation in patents, BioCreative V.5 organized a gene- and protein-related object (GPRO) recognition task, in which participants were asked to identify GPRO mentions and determine whether they could be linked to unique biological database records. In this paper, we describe the system we constructed for this task. Our system combines two named entity recognition (NER) approaches, the statistical-principle-based approach (SPBA) and conditional random fields (CRF), hence the name SPBA-CRF. SPBA is an interpretable machine-learning framework for gene mention recognition, and its predictions are used as features for our CRF-based GPRO recognizer. This recognizer was originally developed to identify chemical mentions in patents, and we adapted it for GPRO recognition. In the BioCreative V.5 GPRO recognition task, SPBA-CRF obtained an F-score of 73.73% on the GPRO type 1 evaluation metric and 78.66% on the metric combining GPRO types 1 and 2. Our results show that SPBA trained on an external NER dataset performs reasonably well under the partial match evaluation metric. Furthermore, SPBA significantly improves the performance of the CRF-based recognizer trained on the GPRO dataset.
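The F-scores above are harmonic means of precision P and recall R, F = 2PR/(P + R). The sketch below contrasts exact matching with a looser partial-match rule on hypothetical (start, end) character spans. The overlap criterion used for partial matching and the function `prf` are assumptions for illustration and may differ from the official BioCreative V.5 GPRO evaluation.

```python
def prf(gold, pred, partial=False):
    """Precision, recall and F-score for (start, end) mention spans.

    Exact mode: a span counts only if it equals a span on the other side.
    Partial mode (assumed rule): a span counts if it overlaps any span
    on the other side.
    """
    def match(a, b):
        if partial:
            return a[0] < b[1] and b[0] < a[1]  # half-open spans overlap
        return a == b

    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 5), (10, 18), (25, 30)]  # hypothetical gold mentions
pred = [(0, 5), (10, 15), (40, 45)]  # hypothetical system output
print(prf(gold, pred))                # exact: P = R = F = 1/3
print(prf(gold, pred, partial=True))  # partial: P = R = F = 2/3
```

The truncated prediction (10, 15) is rejected under exact matching but accepted under the overlap rule, which is why partial-match scores are typically higher.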

5.
Article in English | MEDLINE | ID: mdl-27242035

ABSTRACT

Metastasis is the spread of a cancer/tumor from one organ to another; it is the most dangerous stage of cancer progression, causing more than 90% of cancer deaths. A better understanding of the complicated cellular mechanisms underlying metastasis requires investigation of the signaling pathways involved. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing the large volume of studies in PubMed. MET uses text-mining techniques to recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature, and then visualizes all mined relations as a metastasis network. To facilitate curation, MET is implemented as a browser extension that allows curators to review and edit metastasis-related concepts and relations directly in PubMed. PubMed users can also use MET to view the metastatic networks integrated from a large collection of research papers. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks from PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway.


Subjects
Computational Biology/methods , Data Mining/methods , PubMed , Software , Data Curation , User-Computer Interface
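MET assembles relations mined from many abstracts into a single metastasis network. A minimal sketch of that aggregation step, assuming each mined relation arrives as a hypothetical (PMID, source concept, target concept) tuple: edges are keyed by concept pair and record which abstracts support them, so a curator can see how much literature backs each connection. The tuple format, the placeholder identifiers and `build_network` are illustrative, not MET's internal data model.

```python
from collections import defaultdict


def build_network(relations):
    """Aggregate mined (pmid, source, target) relations into network edges.

    Returns a mapping from (source, target) to the set of supporting PMIDs;
    repeated mentions in the same abstract count once.
    """
    edges = defaultdict(set)
    for pmid, source, target in relations:
        edges[(source, target)].add(pmid)
    return edges


mined = [  # hypothetical mined relations with placeholder PMIDs
    ("PMID_A", "breast cancer", "bone"),
    ("PMID_B", "breast cancer", "bone"),
    ("PMID_A", "CXCR4", "breast cancer"),
]
network = build_network(mined)
for (source, target), pmids in sorted(network.items()):
    print(f"{source} -> {target}: {len(pmids)} abstract(s)")
```

Keeping the supporting PMIDs on each edge, rather than a bare count, lets a curator jump back to the source abstracts when reviewing or editing a relation.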
6.
Article in English | MEDLINE | ID: mdl-27173520

ABSTRACT

Biological expression language (BEL) is one of the most popular languages for representing causal and correlative relationships among biological events. Automatically extracting biomedical events and representing them in BEL can help biologists quickly survey and understand the relevant literature. Recently, many researchers have shown interest in biomedical event extraction. However, the task remains challenging for current systems because it requires integrating different information extraction tasks, such as named entity recognition (NER), named entity normalization (NEN) and relation extraction, into a single system. In this study, we introduce our BelSmile system, which uses a semantic-role-labeling (SRL)-based approach to extract the named entities and events for BEL statements. BelSmile combines our previous NER, NEN and SRL systems. We evaluated BelSmile on the BioCreative V BEL task dataset, where it achieved an F-score of 27.8%, ∼7% higher than the top BioCreative V system. The three main contributions of this study are (i) an effective pipeline approach for extracting BEL statements, (ii) a syntax-based labeler for extracting subject-verb-object tuples and (iii) a publicly available web-based version of BelSmile at iisrserv.csie.ncu.edu.tw/belsmile.


Subjects
Computational Biology/methods , Data Mining/methods , Natural Language Processing , Semantics , Software , Databases, Factual , Humans
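A BEL statement has the form subject, relationship, object, for example `p(HGNC:TNF) increases p(HGNC:NFKB1)`, where `p()` denotes a protein abundance. The sketch below maps a subject-verb-object tuple, such as an SRL-based labeler might produce, to a statement of that form. The verb lexicon, the assumption that entities are already normalized to HGNC symbols, and the function `to_bel` are hypothetical and much simpler than BelSmile.

```python
# Hypothetical verb lexicon mapping predicates to BEL causal relationships.
VERB_TO_BEL = {
    "activates": "increases",
    "induces": "increases",
    "upregulates": "increases",
    "inhibits": "decreases",
    "suppresses": "decreases",
    "downregulates": "decreases",
}


def to_bel(subject_symbol, verb, object_symbol, namespace="HGNC"):
    """Build a BEL statement from a normalized subject-verb-object tuple.

    Returns None when the verb is not in the lexicon, so unknown predicates
    are skipped rather than guessed.
    """
    relation = VERB_TO_BEL.get(verb.lower())
    if relation is None:
        return None
    return (f"p({namespace}:{subject_symbol}) {relation} "
            f"p({namespace}:{object_symbol})")


print(to_bel("TNF", "activates", "NFKB1"))
# p(HGNC:TNF) increases p(HGNC:NFKB1)
```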
7.
Clin Infect Dis ; 58(11): 1625-33, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24599769

ABSTRACT

BACKGROUND: Superinfection with hepatitis D virus (HDV) may increase the risk of hepatitis flares and chronic hepatic complications in patients with chronic hepatitis B virus (HBV) infection. This retrospective observational study aimed to examine the incidence of, and factors associated with, recent HDV superinfection among individuals coinfected with human immunodeficiency virus (HIV) and HBV. METHOD: Anti-HDV immunoglobulin G (IgG) was sequentially determined in 375 HIV/HBV-coinfected patients to estimate HDV incidence between 1992 and 2012. Plasma HDV and HBV loads and HBV surface antigen (HBsAg) levels were determined for the HDV seroconverters. A nested case-control study was conducted to identify factors associated with HDV seroconversion. Phylogenetic analysis was performed using HDV sequences amplified from HDV seroconverters and from patients who were HDV-seropositive at baseline. RESULTS: During 1762.4 person-years of follow-up (PYFU), 16 patients seroconverted for HDV, giving an overall incidence rate of 9.07 per 1000 PYFU; the rate increased from 0 in 1992-2001 to 3.91 in 2002-2006 and to 13.26 per 1000 PYFU in 2007-2012 (P < .05). Recent HDV infection was associated with elevated aminotransferase and bilirubin levels and elevated rapid plasma reagin titers. Of the 12 patients with HDV viremia, 2 were infected with genotype 2 and 10 with genotype 4. HBsAg levels remained elevated despite a significant decline in plasma HBV DNA load with combination antiretroviral therapy containing lamivudine and/or tenofovir. CONCLUSIONS: Our findings show that the incidence of recent HDV infection in HIV/HBV-coinfected patients increased significantly from 1992-2001 to 2007-2011, and was associated with hepatitis flares and syphilis.


Subjects
HIV Infections/complications , Hepatitis B, Chronic/complications , Hepatitis D/epidemiology , Hepatitis Delta Virus/isolation & purification , Adult , Case-Control Studies , Cohort Studies , DNA, Viral/isolation & purification , Female , Hepatitis Antibodies/blood , Hepatitis B virus/immunology , Hepatitis B virus/isolation & purification , Hepatitis Delta Virus/classification , Hepatitis Delta Virus/immunology , Humans , Incidence , Male , Middle Aged , Molecular Sequence Data , Phylogeny , Plasma/virology , RNA, Viral/genetics , RNA, Viral/isolation & purification , Retrospective Studies , Risk Factors , Sequence Analysis, DNA , Viral Load
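The crude incidence rate in the abstract above is the number of new events divided by the total person-time at risk, scaled to 1000 person-years. The sketch applies that formula to the reported figures (16 seroconversions over 1762.4 PYFU); it yields about 9.08 per 1000 PYFU, close to the reported 9.07. The function name `incidence_per_1000` is hypothetical, and confidence intervals and period-specific rates are not computed here.

```python
def incidence_per_1000(events, person_years):
    """Crude incidence rate per 1000 person-years of follow-up (PYFU)."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * 1000


rate = incidence_per_1000(16, 1762.4)
print(f"{rate:.2f} per 1000 PYFU")  # 9.08 per 1000 PYFU
```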
9.
Emerg Microbes Infect ; 2(12): e83, 2013 Dec.
Article in English | MEDLINE | ID: mdl-26038447

ABSTRACT

Human immunodeficiency virus type 1 (HIV-1) circulating recombinant form (CRF) 07_BC has caused serious HIV-1 epidemics among injecting drug users (IDUs) in East Asia. Little is known about the characteristics of the virus and its impact on disease progression among infected individuals. In this study, we compared immunological progression between 423 IDUs infected with CRF07_BC and 194 men who have sex with men (MSM) with primary subtype B infection, and constructed a representative full-length CRF07_BC molecular clone, pCRF07_BC, to characterize the virus. We found that IDUs infected with CRF07_BC had significantly slower immunological progression in the Cox proportional hazards model (hazard ratio: 0.30; 95% confidence interval: 0.13-0.69; P=0.004). The constructed recombinant CRF07_BC viruses showed reduced processing of the Gag/Gag-Pol polyproteins, decreased incorporation of Vpr into virus particles, tethering of virus particles to the plasma membrane and decreased virus growth kinetics. These phenotypes are related to the unique 7-amino-acid deletion in p6 of CRF07_BC, since complementing the 7 deleted amino acids in pCRF07_BC could improve the defective phenotypes. In summary, compared with MSM infected with HIV-1 subtype B, IDUs infected with CRF07_BC had slower immunological progression, which is likely correlated with interference in virus particle maturation by the 7-amino-acid deletion in p6.
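Hazard ratios from a Cox proportional hazards model are usually reported with Wald-type 95% confidence intervals, which are symmetric on the log scale, so the standard error of log(HR) can be approximately recovered as (ln U - ln L)/(2 × 1.96). The sketch applies this to the reported HR of 0.30 (95% CI 0.13-0.69). The Wald-interval assumption and the function `log_hr_se` are ours rather than the abstract's, and because the published bounds are rounded, the recovered values are approximations.

```python
import math


def log_hr_se(lower, upper, z_crit=1.959964):
    """Approximate SE of log(HR) from a 95% CI.

    Assumes a Wald-type interval, i.e. log(HR) +/- z_crit * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


hr, lower, upper = 0.30, 0.13, 0.69  # values reported in the abstract
se = log_hr_se(lower, upper)
# The geometric mean of the bounds should sit close to the point estimate.
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
print(f"SE(log HR) ~ {se:.3f}")                # SE(log HR) ~ 0.426
print(f"log-scale midpoint ~ {midpoint:.2f}")  # log-scale midpoint ~ 0.30
```

The midpoint check is a quick way to confirm that a reported interval is log-symmetric before reusing its implied standard error, for example in a meta-analysis.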
